Fix(kb_helper):数据库选择的提供商消失,导致的数据库不显示问题 by Li-shi-ling · Pull Request #7244 · AstrBotDevs/AstrBot

Li-shi-ling · 2026-03-31T16:32:33Z

fixes #7218
修理了数据库选择的提供商消失，导致的数据库不显示问题

Modifications / 改动点

在AstrBot\astrbot\core\knowledge_base\kb_helper.py的文件给get_ep和get_rp添加了临时占位的提供商,该提供商实现的方法被调用将抛出报错,通过这样的方法将报错延迟到实际调用时,防止数据库在初始化时报错导致的数据库不显示。

This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

这是一个两个提供商都被删除的数据库,正常显示了

使用占位提供商的后台日志

测试截图

Checklist / 检查清单

[ X ] 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能，已经通过 Issue / 邮件等方式和作者讨论过。

没有新功能

[ X ] 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试，并已在上方提供了“验证步骤”和“运行截图”。

有进行测试

[ X ] 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
/ 我确保没有引入新依赖库，或者引入了新依赖库的同时将其添加到 requirements.txt 和 pyproject.toml 文件相应位置。

没有引入

[ X ] 😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。

没有恶意代码

Summary by Sourcery

Bug Fixes:

Prevent knowledge bases from failing to display when their configured embedding or rerank providers are missing by logging and substituting temporary placeholder providers instead of raising initialization-time errors.

sourcery-ai

Hey - I've found 4 issues, and left some high level feedback:

In get_ep / _ensure_vec_db, when embedding_provider_id is missing you now only log an error but still call get_provider_by_id with a falsy ID and continue; consider either returning a placeholder immediately or keeping the early raise to avoid unexpected behavior downstream.
The temporary provider classes (TempEmbeddingProvider, TempRerankProvider) are defined inside the methods and re-created on each call; consider moving them to module scope (or a shared helper) to avoid duplication and make their behavior easier to maintain.
In TempRerankProvider.rerank the error message mentions Embedding Provider instead of Rerank Provider, which is likely a copy-paste mistake and may confuse debugging when the placeholder is triggered.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `get_ep` / `_ensure_vec_db`, when `embedding_provider_id` is missing you now only log an error but still call `get_provider_by_id` with a falsy ID and continue; consider either returning a placeholder immediately or keeping the early `raise` to avoid unexpected behavior downstream.
- The temporary provider classes (`TempEmbeddingProvider`, `TempRerankProvider`) are defined inside the methods and re-created on each call; consider moving them to module scope (or a shared helper) to avoid duplication and make their behavior easier to maintain.
- In `TempRerankProvider.rerank` the error message mentions `Embedding Provider` instead of `Rerank Provider`, which is likely a copy-paste mistake and may confuse debugging when the placeholder is triggered.

## Individual Comments

### Comment 1
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="137-138" />
<code_context>

     async def get_ep(self) -> EmbeddingProvider:
         if not self.kb.embedding_provider_id:
-            raise ValueError(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
+            logger.error(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
         ep: EmbeddingProvider = await self.prov_mgr.get_provider_by_id(
             self.kb.embedding_provider_id,
</code_context>
<issue_to_address>
**issue (bug_risk):** After logging missing embedding_provider_id, the code still proceeds and may pass an invalid ID into get_provider_by_id.

Previously this branch raised and prevented use of an unset `embedding_provider_id`. Now it logs and then calls `get_provider_by_id(self.kb.embedding_provider_id)`, which will be `None` here, risking undefined or inconsistent behavior depending on `get_provider_by_id`’s implementation. Please either short‑circuit when the ID is missing (e.g., return a placeholder provider) or restore an early failure (raise or return a clearly invalid provider) so downstream code never relies on a `None`/invalid ID.
</issue_to_address>

### Comment 2
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="186-194" />
<code_context>
+                    super().__init__(provider_config, provider_settings)
+                    self.rerank_provider_id = rerank_provider_id
+
+                async def rerank(
+                    self,
+                    query: str,
+                    documents: list[str],
+                    top_n: int | None = None,
+                ):
+                    raise ValueError(
+                        f"无法找到 ID 为 {self.rerank_provider_id} 的 Embedding Provider"
+                    )
+
</code_context>
<issue_to_address>
**issue (typo):** TempRerankProvider error message refers to an Embedding Provider instead of a Rerank Provider.

The `TempRerankProvider.rerank` error message currently says `Embedding Provider` instead of `Rerank Provider`, which can be confusing when both provider types exist. Please update the message (and its localized text) to reference the rerank provider to match the actual failing dependency.

```suggestion
            class TempRerankProvider(RerankProvider):
                def __init__(
                    self,
                    provider_config: dict,
                    provider_settings: dict,
                    rerank_provider_id: str,
                ) -> None:
                    super().__init__(provider_config, provider_settings)
                    self.rerank_provider_id = rerank_provider_id

                async def rerank(
                    self,
                    query: str,
                    documents: list[str],
                    top_n: int | None = None,
                ):
                    raise ValueError(
                        f"无法找到 ID 为 {self.rerank_provider_id} 的 Rerank Provider"
                    )
```
</issue_to_address>

### Comment 3
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="167-168" />
<code_context>
+                        f"无法找到 ID 为 {self.embedding_provider_id} 的 Embedding Provider"
+                    )
+
+                def get_dim(self) -> int:
+                    return 512
+
+            ep: EmbeddingProvider = TempEmbeddingProvider(
</code_context>
<issue_to_address>
**issue (bug_risk):** Hardcoded embedding dimension 512 in TempEmbeddingProvider may not match the actual FAISS index or configuration.

`TempEmbeddingProvider.get_dim` always returns 512. If the existing FAISS index or config uses a different dimension, this will lead to runtime errors or index inconsistencies when `_ensure_vec_db` accesses the vector DB. Consider deriving the dimension from the knowledge base config (when available) or raising an explicit error instead of returning a hardcoded value.
</issue_to_address>

### Comment 4
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="136" />
<code_context>

     async def get_ep(self) -> EmbeddingProvider:
         if not self.kb.embedding_provider_id:
-            raise ValueError(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the placeholder provider classes and error-handling into shared, top-level helpers so the provider methods stay short, consistent, and easier to maintain.

You can keep the new “placeholder provider + logging” behavior while reducing complexity and duplication by:

* Moving the temp providers out of the methods.
* Sharing construction logic.
* Fixing the error message typo.
* Keeping `_ensure_vec_db` simple.

For example:

```python
# module-level helpers

class TempEmbeddingProvider(EmbeddingProvider):
    def __init__(self, provider_id: str) -> None:
        super().__init__({}, {})
        self._provider_id = provider_id

    async def get_embedding(self, text: str) -> list[float]:
        raise ValueError(f"无法找到 ID 为 {self._provider_id} 的 Embedding Provider")

    async def get_embeddings(self, texts: list[str]) -> list[list[float]]:
        raise ValueError(f"无法找到 ID 为 {self._provider_id} 的 Embedding Provider")

    def get_dim(self) -> int:
        return 512


class TempRerankProvider(RerankProvider):
    def __init__(self, provider_id: str) -> None:
        super().__init__({}, {})
        self._provider_id = provider_id

    async def rerank(
        self,
        query: str,
        documents: list[str],
        top_n: int | None = None,
    ):
        raise ValueError(f"无法找到 ID 为 {self._provider_id} 的 Rerank Provider")
```

Then the methods become straightforward, with no inner classes and consistent behavior:

```python
async def get_ep(self) -> EmbeddingProvider:
    if not self.kb.embedding_provider_id:
        logger.error(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
        return TempEmbeddingProvider("<missing>")

    ep: EmbeddingProvider | None = await self.prov_mgr.get_provider_by_id(
        self.kb.embedding_provider_id,  # type: ignore[arg-type]
    )
    if not ep:
        logger.error(
            f"无法找到 ID 为 {self.kb.embedding_provider_id} 的 Embedding Provider, 使用占位 Embedding Provider"
        )
        return TempEmbeddingProvider(self.kb.embedding_provider_id)

    return ep
```

```python
async def get_rp(self) -> RerankProvider | None:
    if not self.kb.rerank_provider_id:
        return None

    rp: RerankProvider | None = await self.prov_mgr.get_provider_by_id(
        self.kb.rerank_provider_id,  # type: ignore[arg-type]
    )
    if not rp:
        logger.error(
            f"无法找到 ID 为 {self.kb.rerank_provider_id} 的 Rerank Provider, 使用占位 Rerank Provider"
        )
        return TempRerankProvider(self.kb.rerank_provider_id)

    return rp
```

`_ensure_vec_db` can stay focused on constructing the vector DB, delegating all error semantics to `get_ep` / `get_rp`:

```python
async def _ensure_vec_db(self) -> FaissVecDB:
    if not self.kb.embedding_provider_id:
        logger.error(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")

    ep = await self.get_ep()
    rp = await self.get_rp()

    vec_db = FaissVecDB(
        doc_store_path=str(self.kb_dir / "doc.db"),
        index_store_path=str(self.kb_dir / "index.faiss"),
        embedding_provider=ep,
        rerank_provider=rp,
    )
    ...
```

This keeps the new behavior (logging + placeholder providers + delayed error), but:

* Removes inner classes from methods.
* Centralizes and deduplicates the temp-provider logic.
* Fixes the “Embedding Provider”/“Rerank Provider” message inconsistency.
* Makes `get_ep` / `get_rp` easier to read and maintain.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76259ed2be

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

gemini-code-assist

Code Review

This pull request modifies the knowledge base helper to use placeholder providers instead of raising immediate exceptions when an embedding or rerank provider is missing. Feedback includes addressing a hardcoded vector dimension of 512 that may cause mismatches, moving the placeholder class definitions to the top level for better efficiency and readability, correcting a copy-paste error in an exception message, and removing redundant error logging in the vector database initialization logic.

Li-shi-ling added 8 commits March 31, 2026 23:17

fix:删除报错处理,改为打印日志

49e79d7

fix:删除报错处理,改为打印日志

a2e1739

fix:添加站位编码器

c34789c

fix:添加站位重排序模型

cd2b873

fix:将站位模型的方法调用改为报错

047037b

fix:Embedding Provider站位模型的git_dim方法改为固定返回

a97db8a

fix:修改日志中的错别字

d9bc58b

fix:执行ruff

76259ed

auto-assign bot requested review from Raven95676 and anka-afk March 31, 2026 16:32

dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Mar 31, 2026

sourcery-ai bot reviewed Mar 31, 2026

View reviewed changes

Comment thread astrbot/core/knowledge_base/kb_helper.py

Comment thread astrbot/core/knowledge_base/kb_helper.py

Comment thread astrbot/core/knowledge_base/kb_helper.py

Comment thread astrbot/core/knowledge_base/kb_helper.py

dosubot bot added area:core The bug / feature is about astrbot's core, backend feature:knowledge-base The bug / feature is about knowledge base labels Mar 31, 2026

chatgpt-codex-connector bot reviewed Mar 31, 2026

View reviewed changes

Comment thread astrbot/core/knowledge_base/kb_helper.py

fix:在self.kb.embedding_provider_id不存在时添加临时的占位符

a006cda

gemini-code-assist bot reviewed Mar 31, 2026

View reviewed changes

Comment thread astrbot/core/knowledge_base/kb_helper.py

Comment thread astrbot/core/knowledge_base/kb_helper.py

Comment thread astrbot/core/knowledge_base/kb_helper.py Outdated

Comment thread astrbot/core/knowledge_base/kb_helper.py

fix:修改一个单词错误

b0a6a9a

Li-shi-ling closed this Apr 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix(kb_helper):数据库选择的提供商消失,导致的数据库不显示问题#7244

Fix(kb_helper):数据库选择的提供商消失,导致的数据库不显示问题#7244
Li-shi-ling wants to merge 10 commits intoAstrBotDevs:masterfrom
Li-shi-ling:fix/iss-7218

Li-shi-ling commented Mar 31, 2026 •

edited

Loading

Uh oh!

sourcery-ai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Li-shi-ling commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Modifications / 改动点

Screenshots or Test Results / 运行截图或测试结果

Checklist / 检查清单

Summary by Sourcery

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Li-shi-ling commented Mar 31, 2026 •

edited

Loading